
refactor: fuse set_world_poses/set_scales into single _compose_fabric_transform#5674

Draft
pv-nvidia wants to merge 2 commits into isaac-sim:develop from pv-nvidia:pv/fabric-fused-compose

Conversation

@pv-nvidia
Contributor

@pv-nvidia pv-nvidia commented May 18, 2026

Problem

FabricFrameView had duplicated kernel-launch logic in set_world_poses and set_scales, and the initial USD→Fabric sync called both methods sequentially. This meant:

  1. Duplicated code — the same 20-line kernel launch pattern appeared twice with minor variations (positions/orientations vs scales).
  2. Double PrepareForReuse — the initial USD→Fabric sync in _sync_fabric_from_usd_once called set_world_poses then set_scales, each invoking PrepareForReuse. A second non-idempotent PrepareForReuse call could mask a topology-change signal that should have triggered a fabricarray rebuild.
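
The second point can be illustrated with a toy model in plain Python. It assumes the topology-change signal is consume-on-read, meaning the first call after a change reports it and later calls report nothing. That assumption and the `FakeSelection` class are illustrative only and are not taken from the USDRT API.

```python
class FakeSelection:
    """Hypothetical stand-in for a Fabric prim selection (not the USDRT API)."""

    def __init__(self):
        self._topology_dirty = False

    def mark_topology_changed(self):
        self._topology_dirty = True

    def prepare_for_reuse(self):
        # Assumed consume-on-read semantics: report the change once, then reset.
        changed = self._topology_dirty
        self._topology_dirty = False
        return changed


sel = FakeSelection()
sel.mark_topology_changed()

first = sel.prepare_for_reuse()   # sees the topology change
second = sel.prepare_for_reuse()  # the signal has already been consumed
print(first, second)  # prints: True False
```

Under this model, any code path that only acts on the second result never learns that a rebuild was needed, which is why a single call per logical update is easier to reason about.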

Solution

Extract _compose_fabric_transform(positions=None, orientations=None, scales=None, indices=None) — a single method that composes any subset of transform components into one kernel launch. Components left as None are skipped via empty arrays.

  • set_world_poses → delegates to _compose_fabric_transform(positions=..., orientations=...)
  • set_scales → delegates to _compose_fabric_transform(scales=...)
  • _sync_fabric_from_usd_once → single fused call with all three components
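
The delegation pattern above can be sketched in plain Python. This is a minimal stand-in under stated assumptions: `store` is a hypothetical list of dicts replacing the Fabric arrays, `_kernel` replaces the Warp kernel, and empty tuples replace the zero-sized arrays. Only the method names and the "None means skip" convention come from the PR description.

```python
def _kernel(store, indices, positions, orientations, scales):
    """Stand-in for the single fused kernel: empty inputs are skipped."""
    for slot, prim in enumerate(indices):
        if len(positions) > 0:
            store[prim]["position"] = positions[slot]
        if len(orientations) > 0:
            store[prim]["orientation"] = orientations[slot]
        if len(scales) > 0:
            store[prim]["scale"] = scales[slot]


def compose_transform(store, positions=None, orientations=None, scales=None, indices=None):
    """Write any subset of components with one kernel call (hypothetical sketch)."""
    empty = ()  # placeholder for a component the caller left as None
    indices = range(len(store)) if indices is None else indices
    _kernel(
        store,
        indices,
        empty if positions is None else positions,
        empty if orientations is None else orientations,
        empty if scales is None else scales,
    )


def set_world_poses(store, positions=None, orientations=None, indices=None):
    compose_transform(store, positions=positions, orientations=orientations, indices=indices)


def set_scales(store, scales, indices=None):
    compose_transform(store, scales=scales, indices=indices)


# Two prims with default transforms; the 4-tuple is just an illustrative quaternion.
store = [
    {"position": (0.0, 0.0, 0.0), "orientation": (1.0, 0.0, 0.0, 0.0), "scale": (1.0, 1.0, 1.0)}
    for _ in range(2)
]
set_scales(store, [(2.0, 2.0, 2.0)], indices=[1])                      # touches only prim 1's scale
set_world_poses(store, positions=[(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])  # orientations left untouched
```

Both public setters become thin wrappers, so the launch logic exists in one place and an initial sync can pass all three components in a single call.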

Additional fix

The topology-change invariant guard in _rebuild_fabric_arrays used assert, which is stripped under python -O. Replaced with raise RuntimeError so it's always active.
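
The difference can be reproduced without restarting the interpreter, because the built-in `compile()` accepts `optimize=1`, which applies the same assert stripping as `python -O`. The guard functions below are hypothetical stand-ins and do not reproduce the actual check in `_rebuild_fabric_arrays`.

```python
GUARD_WITH_ASSERT = """
def check(expected, actual):
    assert expected == actual, "prim count changed"
    return "ok"
"""

GUARD_WITH_RAISE = """
def check(expected, actual):
    if expected != actual:
        raise RuntimeError("prim count changed")
    return "ok"
"""


def load_optimized(source):
    """Compile source the way `python -O` would (optimize=1) and return its check()."""
    namespace = {}
    exec(compile(source, "<guard>", "exec", optimize=1), namespace)
    return namespace["check"]


assert_guard = load_optimized(GUARD_WITH_ASSERT)
raise_guard = load_optimized(GUARD_WITH_RAISE)

# The assert statement was removed at compile time, so the mismatch passes silently.
print(assert_guard(4, 5))  # prints: ok

# The explicit raise survives optimization.
try:
    raise_guard(4, 5)
except RuntimeError as err:
    print("caught:", err)  # prints: caught: prim count changed
```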

Tests

All 36 existing Fabric tests pass (+ 2 xfail).


Merge Order

This PR is part of a stacked series for Fabric-accelerated local poses:

  1. feat: add typed service locator to SimulationContext #5672 — service locator ← this depends on
  2. feat: add FabricStageCache service for shared hierarchy handles #5676 — FabricStageCache
  3. feat: add indexed fabric transform kernels for local/world matrix propagation #5675 — indexed fabric kernels
  4. refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory #5673 — factory dispatch
  5. refactor: fuse set_world_poses/set_scales into single _compose_fabric_transform #5674 — fused compose ← this PR (depends on feat: add typed service locator to SimulationContext #5672)
  6. PR C+D — local poses (depends on all above)


@isaaclab-review-bot isaaclab-review-bot Bot left a comment


🔍 Code Review Summary

This PR successfully refactors FabricFrameView to consolidate duplicate kernel-launch logic into a single _compose_fabric_transform method and addresses a critical assertion bug. The changes are well-structured and improve both correctness and maintainability.


✅ Correctness

No blocking issues found.

The refactoring maintains behavioral equivalence:

  • _compose_fabric_transform correctly handles None parameters via empty arrays, preserving the kernel's ability to skip unspecified components
  • The _fabric_usd_sync_done flag is set correctly after writes
  • The assert → RuntimeError change in _rebuild_fabric_arrays is a valuable fix: assertions are stripped under python -O, which could silently produce wrong poses or out-of-bounds kernel indices in production

Minor observation:

  • The _compose_fabric_transform method always calls _fabric_hierarchy.update_world_xforms() even when only scales are updated. This matches the original set_scales behavior, so no regression, but worth noting for future optimization.

⚡ Performance

Net positive impact:

  1. Reduced PrepareForReuse calls — The critical fix: _sync_fabric_from_usd_once now invokes PrepareForReuse exactly once (via the fused compose) instead of twice. This prevents potential topology-change signal masking.

  2. Single kernel launch for initial sync — USD→Fabric sync now uses one wp.launch instead of two, reducing kernel launch overhead.

Micro-optimization opportunity (non-blocking):

# Current: creates empty arrays every call
empty3 = wp.zeros((0, 3), dtype=wp.float32, device=self._device)
empty4 = wp.zeros((0, 4), dtype=wp.float32, device=self._device)

These could be cached as class-level constants, but the overhead is negligible for zero-sized arrays.
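
One possible shape for that caching, sketched in plain Python: because the placeholders depend on the device, the sketch keys the cache on `(width, device)` rather than using a single class-level constant. `EmptyArray` and `cached_empty` are hypothetical names, and a real version would presumably wrap the `wp.zeros` calls shown above.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class EmptyArray:
    """Hypothetical stand-in for a zero-row device array."""

    shape: tuple
    device: str


@lru_cache(maxsize=None)
def cached_empty(width, device):
    # Built once per (width, device) pair, then reused on every later call.
    return EmptyArray(shape=(0, width), device=device)


a = cached_empty(3, "cuda:0")
b = cached_empty(3, "cuda:0")
c = cached_empty(4, "cuda:0")
print(a is b)   # prints: True
print(c.shape)  # prints: (0, 4)
```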


📝 Maintainability

Excellent improvements:

  1. Code deduplication: ~40 lines of duplicate kernel-launch boilerplate consolidated into one reusable method
  2. Clearer naming: empty3/empty4 communicates intent better than dummy3/dummy4
  3. Good documentation: the _compose_fabric_transform docstring clearly explains the single-PrepareForReuse guarantee and the _fabric_usd_sync_done side effect
  4. Test modernization: wp.to_torch(fab_pos) → fab_pos.torch uses the current ProxyArray API

Changelogs are well-formatted and accurately describe both the behavioral changes and the assert fix.


🔗 Service Locator (Bundled Feature)

The typed service locator in SimulationContext is a clean pattern:

  • Type-safe via TypeVar
  • Automatic close() on clear_instance()
  • Good test coverage for edge cases (re-registration, missing close method, etc.)

One consideration: set_service does not auto-close the replaced service (caller responsibility per docstring). This is documented but could be a subtle footgun. Consider adding an optional auto_close=True parameter in the future if replacement becomes common.
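
A minimal sketch of such a locator, to make the replacement caveat concrete. This is not the SimulationContext implementation: `ServiceLocator`, `get_service`, `clear`, and `StageCache` are assumed names for illustration, and only `set_service`, the `TypeVar` typing, and the close-on-clear behavior are taken from the review text.

```python
from typing import Dict, Optional, Type, TypeVar, cast

T = TypeVar("T")


class ServiceLocator:
    """Hypothetical typed registry keyed by service class."""

    def __init__(self) -> None:
        self._services: Dict[type, object] = {}

    def set_service(self, service_type: Type[T], service: T) -> None:
        # Replacing an entry does NOT close the old one; that is left to the caller.
        self._services[service_type] = service

    def get_service(self, service_type: Type[T]) -> Optional[T]:
        return cast(Optional[T], self._services.get(service_type))

    def clear(self) -> None:
        # Close every registered service that exposes close(); skip the rest.
        for service in self._services.values():
            close = getattr(service, "close", None)
            if callable(close):
                close()
        self._services.clear()


class StageCache:
    """Hypothetical service with a close() hook."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


locator = ServiceLocator()
old_cache, new_cache = StageCache(), StageCache()
locator.set_service(StageCache, old_cache)
locator.set_service(StageCache, new_cache)  # old_cache is replaced but stays open
locator.clear()
print(old_cache.closed, new_cache.closed)  # prints: False True
```

The replaced instance is never closed in this sketch, which is the footgun described above.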


Verdict: Approve

The refactoring achieves its goals cleanly:

  • Eliminates duplicate code
  • Fixes a real bug (assert under -O)
  • Reduces PrepareForReuse call count
  • Adds useful infrastructure (service locator)

All 36 existing Fabric tests pass per PR description. Ship it! 🚀

@pv-nvidia pv-nvidia force-pushed the pv/fabric-fused-compose branch 2 times, most recently from 1b12c6c to aad8a83 Compare May 18, 2026 17:10
pv-nvidia added 2 commits May 20, 2026 13:53
…factory

FabricFrameView had an internal _use_fabric flag that fell back to
UsdFrameView when Fabric was disabled or the device was unsupported.
This violated single-responsibility: FabricFrameView pretended to be
one class but sometimes behaved as another.

Now the FrameView factory handles all dispatch:
- PhysX + Fabric enabled + supported device → FabricFrameView
- PhysX without Fabric (or unsupported device) → UsdFrameView
- Newton → NewtonSiteFrameView

FabricFrameView no longer checks _use_fabric or _fabric_supported_devices.
It assumes Fabric is available (the factory guarantees this).

UsdFrameView is eagerly registered on the factory since it lives in
isaaclab (not a backend package), so FactoryBase's dynamic import
(isaaclab_{backend}.sim.views) can't discover it.

Extract _compose_fabric_transform() to deduplicate the kernel-launch
logic shared by set_world_poses and set_scales. The initial USD->Fabric
sync now composes position, orientation, and scale in one call, so
PrepareForReuse is invoked exactly once per logical update.

Also replace assert with RuntimeError in _rebuild_fabric_arrays so the
topology-change guard survives python -O.
@pv-nvidia pv-nvidia force-pushed the pv/fabric-fused-compose branch from aad8a83 to b0206ce Compare May 20, 2026 14:11

@isaaclab-review-bot isaaclab-review-bot Bot left a comment


🔍 Code Review Summary

This PR successfully refactors FabricFrameView to consolidate duplicate kernel-launch logic into a single _compose_fabric_transform method and moves Fabric/USD dispatch to the FrameView factory. The changes improve both correctness and maintainability.


✅ Correctness

No blocking issues found.

The refactoring maintains behavioral equivalence:

  • _compose_fabric_transform correctly handles None parameters via empty arrays, preserving the kernel's ability to skip unspecified components
  • The _fabric_usd_sync_done flag is correctly set after writes
  • The assert → RuntimeError change in _rebuild_fabric_arrays is a valuable fix: assertions are stripped under python -O, which could silently produce wrong poses or out-of-bounds kernel indices in production
  • Factory dispatch logic in FrameView._get_backend() correctly checks both fabricEnabled setting and device support before routing to FabricFrameView

Minor observation:

  • The _compose_fabric_transform method always calls _fabric_hierarchy.update_world_xforms() even when only scales are updated. This matches the original set_scales behavior, so no regression, but worth noting for future optimization.

⚡ Performance

Net positive impact:

  1. Reduced PrepareForReuse calls — The critical fix: _sync_fabric_from_usd_once now invokes PrepareForReuse exactly once (via the fused compose) instead of twice. This prevents potential topology-change signal masking.

  2. Single kernel launch for initial sync — USD→Fabric sync now uses one wp.launch instead of two, reducing kernel launch overhead.

Micro-optimization opportunity (non-blocking):

# Current: creates empty arrays every call
empty3 = wp.zeros((0, 3), dtype=wp.float32, device=self._device)
empty4 = wp.zeros((0, 4), dtype=wp.float32, device=self._device)

These could be cached as class-level constants, but the overhead is negligible for zero-sized arrays.


📝 Maintainability

Excellent improvements:

  1. Code deduplication: ~40 lines of duplicate kernel-launch boilerplate consolidated into one reusable method
  2. Single responsibility: FabricFrameView no longer contains dispatch logic; it assumes Fabric is available (the factory guarantees this)
  3. Clearer naming: empty3/empty4 communicates intent better than dummy3/dummy4
  4. Good documentation: the _compose_fabric_transform docstring clearly explains the single-PrepareForReuse guarantee and the _fabric_usd_sync_done side effect
  5. Test modernization: wp.to_torch(fab_pos) → fab_pos.torch uses the current ProxyArray API

Changelogs are well-formatted and accurately describe both the behavioral changes and the assert fix.


🏭 Factory Dispatch (frame_view.py)

The dispatch logic is clean and well-structured:

  • FrameView._get_backend() checks SimulationContext for physics manager type
  • PhysX path now explicitly checks fabricEnabled setting and device compatibility
  • UsdFrameView is eagerly registered since it lives in isaaclab, not a backend package
  • Good warning message when Fabric is enabled but device is unsupported

One nit (non-blocking): The _FABRIC_SUPPORTED_DEVICES tuple could potentially be shared between frame_view.py and any other code that needs it (currently duplicated between factory and docs), but this is minor.
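
The dispatch rules can be summarized as a small pure function. This is a sketch, not the code in frame_view.py: `select_frame_view`, its parameters, and the string backend names are hypothetical, and the default `supported_devices` value is only an illustrative placeholder. The three target class names and the routing rules are taken from the commit message.

```python
def select_frame_view(backend, fabric_enabled, device, supported_devices=("cuda:0",)):
    """Return the name of the view class the factory would pick (hypothetical sketch)."""
    if backend == "newton":
        return "NewtonSiteFrameView"
    if backend == "physx":
        if fabric_enabled and device in supported_devices:
            return "FabricFrameView"
        # Fabric disabled, or enabled on an unsupported device: fall back to USD.
        # A real factory would also emit a warning in the second case.
        return "UsdFrameView"
    raise ValueError(f"unknown physics backend: {backend!r}")


print(select_frame_view("physx", True, "cuda:0"))   # prints: FabricFrameView
print(select_frame_view("physx", True, "cuda:1"))   # prints: UsdFrameView
print(select_frame_view("physx", False, "cuda:0"))  # prints: UsdFrameView
print(select_frame_view("newton", True, "cuda:0"))  # prints: NewtonSiteFrameView
```

Passing the supported devices as a single argument is one way to avoid the duplicated tuple mentioned in the nit.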


⚠️ CI Status

The "Check changelog fragments" job is failing. Please verify the changelog fragment format matches project conventions.


Verdict

The refactoring achieves its goals cleanly:

  • ✅ Eliminates duplicate kernel-launch code
  • ✅ Fixes a real bug (assert stripped under -O)
  • ✅ Reduces PrepareForReuse call count in initial sync
  • ✅ Moves dispatch responsibility to the factory (single-responsibility principle)
  • ✅ All 36 existing Fabric tests pass per PR description

Once the changelog CI check passes, this is ready to merge. 🚀

@isaaclab-review-bot isaaclab-review-bot Bot mentioned this pull request May 20, 2026
7 tasks
kellyguo11 pushed a commit that referenced this pull request May 20, 2026
# Description

Removes the `cuda:0`-only restriction in `FabricFrameView`. USDRT
`SelectPrims` now accepts any CUDA device index, so Fabric acceleration
runs on the simulation device (e.g., `cuda:1`) instead of silently
falling back to the slower USD path. This unblocks distributed training
where each process is pinned to a specific GPU.

Changes:

- **Drop device allowlist.** Removes `_fabric_supported_devices`, the
device guard in `__init__`, and the corresponding assertion in
`_initialize_fabric`. Any CUDA device (or CPU) now works.
- **Multi-GPU test coverage.** Three `cuda:1`-parameterized tests gated
by `ISAACLAB_TEST_MULTI_GPU=1` env var, plus a dedicated CI workflow on
the multi-GPU runner that sets it.
- **Fix deprecated `wp.to_torch()` calls.** Replaced with `.torch`
accessor on ProxyArray (avoids DeprecationWarning).
- **TODOs for follow-up PRs:**
   - #5673 
   - #5674 

## Type of change

- New feature (non-breaking change which adds functionality)

`cuda:0` continues to work exactly as before; `cuda:1`+ now also works
instead of silently falling back to USD. No public API surface changed.

## Checklist

- [x] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

> Note: this PR uses a fragment file at
`source/isaaclab_physx/changelog.d/feat-frame-view-enable-mgpu.rst` per
the fragment-based changelog system.

## Test plan

Three new tests gated by `ISAACLAB_TEST_MULTI_GPU=1` and parameterized
with `["cuda:1"]`:

- `test_fabric_cuda1_world_pose_roundtrip` — `set_world_poses` →
`get_world_poses` returns the same values on a non-primary CUDA device.
- `test_fabric_cuda1_no_usd_writeback` — Fabric writes on `cuda:1` do
not write back to USD.
- `test_fabric_cuda1_scales_roundtrip` — covers the `set_scales` write
path on `cuda:1`.

A dedicated CI workflow (`test-fabric-multi-gpu.yaml`) runs on the
`[self-hosted, linux, x64, gpu, multi-gpu]` runner with
`ISAACLAB_TEST_MULTI_GPU=1` set. Pre-flights with `nvidia-smi` and
`torch.cuda.device_count()`, fails loudly if the runner has < 2 GPUs.
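
The gating and pre-flight behavior described above can be sketched as one decision function. `multi_gpu_preflight` is a hypothetical helper and is not the actual test or workflow code. It takes the environment and GPU count as arguments so it can be exercised without a GPU.

```python
def multi_gpu_preflight(env, gpu_count):
    """Return "skip" or "run", or raise when an opted-in runner lacks GPUs."""
    if env.get("ISAACLAB_TEST_MULTI_GPU") != "1":
        return "skip"  # not opted in: the multi-GPU tests auto-skip
    if gpu_count < 2:
        # Opted in but under-provisioned: fail loudly instead of skipping.
        raise RuntimeError(
            f"multi-GPU tests requested but only {gpu_count} GPU(s) visible"
        )
    return "run"


print(multi_gpu_preflight({}, 1))                                # prints: skip
print(multi_gpu_preflight({"ISAACLAB_TEST_MULTI_GPU": "1"}, 2))  # prints: run
```

In practice the caller would pass `os.environ` and the value of `torch.cuda.device_count()`.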

To verify locally on a multi-GPU machine:

```bash
ISAACLAB_TEST_MULTI_GPU=1 ./isaaclab.sh -p -m pytest \
    source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v
```

To verify the `cuda:0` path is unchanged (multi-GPU tests auto-skip):

```bash
./isaaclab.sh -p -m pytest \
    source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v
```
@pv-nvidia
Contributor Author

Superseded by the consolidated PR #5728 (pv/fabric-full-stack).

@pv-nvidia pv-nvidia closed this May 22, 2026
@pv-nvidia pv-nvidia reopened this May 22, 2026

Labels

isaac-lab Related to Isaac Lab team

1 participant